W3: Data Visualization

Announcements

Last Week Today

Clear Points

implicit subsetting

Today was pretty clear. I need to figure out the syntax for diving into specific parts of data frames, but I think I just need to play with this.

Muddy Points

implicit subsetting

We will revisit this when we get to data wrangling in a couple weeks

Can vectors be thought of as data columns or are they sometimes rows or neither?

In R, the columns of a data.frame are vectors.

When to use $

We use $ to refer to a column within a data.frame, as in penguins$species. One of the joys of the tidyverse is that you rarely need $: most tidyverse functions let you refer to a column by its bare name.
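As a small illustration of the two answers above, the sketch below pulls one column out of a data frame with $ and checks that it behaves like an ordinary vector. It assumes that penguins is the version shipped in the palmerpenguins package (the slides do not say where the data is loaded from), and it shows dplyr::pull() as one tidyverse alternative to $.

```r
# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

# $ extracts a single column from a data.frame as a vector
bill_lengths <- penguins$bill_length_mm

is.numeric(bill_lengths)                # the column is a numeric vector
length(bill_lengths) == nrow(penguins)  # one element per row

# A tidyverse alternative that uses the bare column name
same_column <- dplyr::pull(penguins, bill_length_mm)
identical(bill_lengths, same_column)
```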

Muddy Points

Nothing muddy, but I think since vectors are such a crucial concept to learn, live coding with a simpler data set would be easier. Sometimes the medical data can go over my head. …

Thanks for the feedback. There is a preview of the data we’re using today in a slide.

Muddy Points

Not necessarily unclear, but it would be helpful to do a quick rundown of any tips to remember the ordering for how to recall information and also the ordering of arguments for other functions.

So far we have mostly relied on position to match values to arguments. If you name the arguments, you can supply them in any order. These are all equivalent:

seq(1,10,2)
seq(from=1, to=10, by=2)
seq(by=2, from=1, to=10)
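One way to convince yourself that the three calls above agree is to compare their results directly. This sketch also mixes positional and named arguments, and uses args() to list a function's arguments when auto-completion is not at hand. The variable names are only for illustration.

```r
by_position <- seq(1, 10, 2)
by_name     <- seq(from = 1, to = 10, by = 2)
reordered   <- seq(by = 2, from = 1, to = 10)

identical(by_position, by_name)
identical(by_position, reordered)

# Positional and named arguments can be mixed:
# unnamed values fill the remaining arguments in order
mixed <- seq(1, 10, by = 2)
identical(by_position, mixed)

# args() prints a function's argument names and their order
args(seq.default)
```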

When working with functions, auto-completion is your friend.

Data Visualization

Penguins Dataset

gt::gt(head(penguins))
species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
Adelie   Torgersen  39.1            18.7           181                3750         male    2007
Adelie   Torgersen  39.5            17.4           186                3800         female  2007
Adelie   Torgersen  40.3            18.0           195                3250         female  2007
Adelie   Torgersen  NA              NA             NA                 NA           NA      2007
Adelie   Torgersen  36.7            19.3           193                3450         female  2007
Adelie   Torgersen  39.3            20.6           190                3650         male    2007
  • Note that our dataset has column names
  • In ggplot2, we don’t need to use the $ operator: penguins$species
  • We use the bare column name to refer to it: species
    • bill_depth_mm : numeric
    • bill_length_mm : numeric
    • species : factor (categorical, plotted like character data)
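To check the column types yourself, a minimal sketch along these lines should work. It assumes the palmerpenguins version of penguins, where species is stored as a factor rather than a plain character vector; ggplot2 treats both as categorical. The name column_types is only for illustration.

```r
# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

class(penguins$bill_depth_mm)
class(penguins$bill_length_mm)
class(penguins$species)

# The first class of every column, as a named character vector
column_types <- sapply(penguins, function(col) class(col)[1])
column_types
```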

{visdat} for Exploratory Data Analysis

library(visdat)
vis_dat(penguins)

Common Plots

One Variable

  • Numeric: histogram
  • Character: bar plots

Two Variables

  • Numeric vs. Numeric: Scatterplot, line plot
  • Numeric vs. Character: Box plot

Why focus on these plots?

We build a plot one part at a time

Data +

Mapping to data +

Geometry

Think about making plots like using recipes from a cookbook: https://r-graphics.org/

One variable plots

Building a Histogram

ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram()

Data +

Mapping to data +

Geometry

ggplot(penguins)

ggplot(penguins) +

  • We always start with ggplot()
  • The first argument to ggplot() is the data
  • We add details to the plot with the + (plus sign)

aes():

aes(x = bill_length_mm) +

  • We map data in with the aes() function
  • x is an aesthetic - it maps data to a visual property
  • In the aes() function, we use bare column names: bill_length_mm
  • If you want to know what aesthetics to map, look at the geom documentation, for example ?geom_histogram

geom_histogram()

geom_histogram()

  • All geometries begin with geom_
  • geom_s require specific aesthetics
  • When in doubt, look at the documentation:
    • ?geom_histogram

Taking it one part at a time

ggplot(penguins)

Taking it one part at a time

ggplot(penguins) +
  aes(x = bill_length_mm)

Taking it one part at a time

ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram()

Histogram recap

ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram()

Bar plots

Bar plots are made for categorical data. geom_bar() counts the rows in each group for you, so you only need to map one variable (the x axis).

ggplot(penguins) +
  aes(x = species) +
  geom_bar()
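To see the automatic counting at work, the sketch below compares the bar heights that ggplot2 computes with a manual count of rows per species. It assumes the palmerpenguins version of penguins and uses ggplot2's layer_data() helper to read back the computed values. The variable names are only for illustration.

```r
library(ggplot2)

# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

bar_plot <- ggplot(penguins) +
  aes(x = species) +
  geom_bar()

# layer_data() returns the data ggplot2 computed for a layer;
# for geom_bar() the count column holds the bar heights
bar_heights <- layer_data(bar_plot)$count

# Counting the rows per species by hand gives the same numbers
species_counts <- as.vector(table(penguins$species))

all(bar_heights == species_counts)
```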

Two variable plots

Scatterplot

ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_point()

Scatterplot (data)

ggplot(penguins)

Scatterplot (aesthetics)

ggplot(penguins) +
  aes(x = bill_length_mm, 
      y=bill_depth_mm) 

Scatterplot (geometry)

ggplot(penguins) +
  aes(x = bill_length_mm, 
      y=bill_depth_mm) +
  geom_point()

Note: Where to put aes()

Our code looks like this:

ggplot(penguins) +
  aes(x = bill_length_mm, y=bill_depth_mm) +
  geom_point()

Most ggplot code looks like this:

ggplot(penguins, mapping = aes(x = bill_length_mm, y=bill_depth_mm)) +
  geom_point()

Either is acceptable!
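A third placement you may come across puts aes() inside the geom, in which case the mapping applies to that layer only. The sketch below builds all three variants and checks that they compute the same points. It assumes the palmerpenguins version of penguins; the object names are only for illustration.

```r
library(ggplot2)

# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

# aes() added as its own step
p_separate <- ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_point()

# aes() passed to ggplot() through the mapping argument
p_inline <- ggplot(penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

# aes() passed to the geom: the mapping applies to this layer only
p_in_geom <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm))

# All three variants compute the same layer data
same_inline  <- isTRUE(all.equal(layer_data(p_separate), layer_data(p_inline)))
same_in_geom <- isTRUE(all.equal(layer_data(p_separate), layer_data(p_in_geom)))
```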

What about more than two variables?

Three Variables

ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, color = species) +
  geom_point()
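A related point that often causes confusion: inside aes(), color is mapped to a column, so each species gets its own color. Outside aes(), color is set to one fixed value for every point. The sketch below contrasts the two, assuming the palmerpenguins version of penguins; "steelblue" is an arbitrary choice of color.

```r
library(ggplot2)

# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

# Mapped: color depends on the species column
p_mapped <- ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, color = species) +
  geom_point()

# Set: every point gets the same fixed color
p_fixed <- ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_point(color = "steelblue")

# Inspect the colors ggplot2 assigned to the points
mapped_colours <- unique(layer_data(p_mapped)$colour)
fixed_colours  <- unique(layer_data(p_fixed)$colour)
```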

Additions to Basic Plots

Histogram with a plot theme

ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram() +
  theme_bw()

Histogram with options

ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram(binwidth = 5)
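binwidth is given in the units of the x variable (millimetres here). geom_histogram() also accepts bins, which fixes the number of bars instead of their width. A minimal sketch of both options, assuming the palmerpenguins version of penguins; the values 1 and 20 are arbitrary.

```r
library(ggplot2)

# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

# Each bar covers 1 mm of bill length
p_binwidth <- ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram(binwidth = 1)

# The x range is split into 20 bars
p_bins <- ggplot(penguins) +
  aes(x = bill_length_mm) +
  geom_histogram(bins = 20)

# One row of layer data per bar
n_bars <- nrow(layer_data(p_bins))
```

Rows with a missing bill length are dropped with a warning, which is expected for this dataset.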

Boxplot

ggplot(penguins) +
  aes(x = species, y = bill_depth_mm) +
  geom_boxplot()

Faceting

ggplot(penguins) +
  aes(x = species, y = bill_depth_mm, color = species) +
  geom_boxplot() +
  facet_wrap(~island)
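facet_wrap() draws one panel per value of the faceting variable. As a sketch of two common variations, the code below names the variable with vars() instead of the ~ formula and stacks the panels in a single column with ncol. It assumes the palmerpenguins version of penguins.

```r
library(ggplot2)

# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

p_facets <- ggplot(penguins) +
  aes(x = species, y = bill_depth_mm, color = species) +
  geom_boxplot() +
  facet_wrap(vars(island), ncol = 1)  # same panels as ~island, stacked vertically

# One panel per island
n_panels <- length(unique(layer_data(p_facets)$PANEL))
```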

Multivariate Scatterplot by facet

ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_point() +
  facet_wrap(~species)

Some additional options

ggplot(data = penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, color = species) +
  geom_point() +
  labs(x = "Bill Length", y = "Bill Depth",
       title = "Comparison of penguin bill length and bill depth across species")

Layering Geometries

geom_tile() + geom_text() = heatmap

Why is this heatmap missing boxes? Hint: look at penguin counts.

Look at the count() function and see if there’s an argument we can set to fill in the missing boxes.

penguin_counts <- count(x=penguins, species, island)
penguin_counts
# A tibble: 5 × 3
  species   island        n
  <fct>     <fct>     <int>
1 Adelie    Biscoe       44
2 Adelie    Dream        56
3 Adelie    Torgersen    52
4 Chinstrap Dream        68
5 Gentoo    Biscoe      124

Missing Values

ggplot(penguin_counts) +
  aes(x=species, 
      y=island, 
      fill=n) +
  geom_tile() +
  geom_text(aes(label=n), 
            color="white")
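One possible answer to the missing-boxes exercise, as a sketch: dplyr's count() has a .drop argument, and setting it to FALSE keeps combinations of factor levels that never occur, with a count of 0. This assumes the palmerpenguins version of penguins, where species and island are both factors. The names all_counts and heatmap_full are only for illustration.

```r
library(dplyr)
library(ggplot2)

# Assumption: penguins is the palmerpenguins version of the dataset
penguins <- palmerpenguins::penguins

# .drop = FALSE keeps species/island pairs with no penguins (n = 0)
all_counts <- count(penguins, species, island, .drop = FALSE)
all_counts

# Every species/island pair now has a row, so every tile is drawn
heatmap_full <- ggplot(all_counts) +
  aes(x = species, y = island, fill = n) +
  geom_tile() +
  geom_text(aes(label = n), color = "white")
heatmap_full
```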

esquisse as a helper

Consider the esquisse package to help generate your ggplot code via drag and drop.

library(esquisse)

esquisser(penguins)

For More Practice:

R Graphics Cookbook

An excellent resource: https://r-graphics.org/